Background and objective: Acute Lymphoblastic Leukaemia (ALL) is the most common childhood cancer, with B lineage ALL accounting for approximately 85% of cases. Minimal residual disease (MRD) levels during induction therapy are important prognostic biomarkers that allow patients to be stratified according to their risk of relapse. MRD is quantified by molecular analysis of antigen receptor gene rearrangements or by flow cytometric evaluation of aberrant immunophenotypes. The latter method is increasingly used worldwide because of its lower cost, its provision of predictive biomarkers for antigen-directed therapies and the inclusion of both MRD methodologies in contemporary clinical trials. However, accurate detection of MRD by flow cytometry is complex and time-consuming, and relies heavily on user training and expertise.

Machine learning approaches are being developed in the field with the aim of replacing manual analysis of Flow MRD data. This study aimed to test the ability of the unsupervised machine learning dimensionality reduction algorithm, Uniform Manifold Approximation and Projection (UMAP), to visualise 'MRD islands', separate them from normal cell populations and accurately quantify them with minimal human intervention.

Methods: Flow cytometric data from bone marrow samples from children entered on the UKALL2011 clinical trial were used in the study. For B lineage ALL, the protocol included the identification of a leukaemia-associated immunophenotype at diagnosis using 2 antibody combinations, and 'on treatment' samples were then run with the most appropriate antibody tube. In this study, we used data from Euro tube 1, which included the antibodies CD20 FITC, CD38 PE, CD34 PerCP, CD10 PE-Cy7, CD19 APC and CD45 APC-Cy7, and the nuclear dye, Syto 41, tagged with Pacific Blue. FCS files were uploaded to Cytobank and pre-gated for CD19 positive cells prior to running the UMAP algorithm. Each data set was run alongside data from 10 normal marrows stained with Euro tube 1.
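The pre-gating and pooling step described above can be sketched as follows. This is a minimal illustration only: the study performed these steps in Cytobank, and the `Event` record, the function names and the CD19 threshold value are hypothetical choices made for the example, not part of the protocol.

```python
from dataclasses import dataclass

# Hypothetical cut-off in arbitrary fluorescence units. The abstract does not
# state how CD19 positivity was defined, so this value is an assumption.
CD19_POSITIVE_THRESHOLD = 1000.0


@dataclass
class Event:
    """One flow cytometry event (hypothetical record layout)."""
    source: str        # "patient" or "normal"
    intensities: dict  # marker name -> fluorescence intensity


def pregate_cd19(events, threshold=CD19_POSITIVE_THRESHOLD):
    """Keep only events whose CD19 intensity is at or above the threshold."""
    return [e for e in events if e.intensities["CD19"] >= threshold]


def pool_with_reference(patient_events, normal_marrows):
    """Pre-gate one patient sample and pool it with pre-gated normal marrows."""
    pooled = pregate_cd19(patient_events)
    for marrow in normal_marrows:
        pooled.extend(pregate_cd19(marrow))
    return pooled


# Toy data for illustration only; these are not study measurements.
patient = [
    Event("patient", {"CD19": 2500.0, "CD10": 8000.0}),
    Event("patient", {"CD19": 120.0, "CD10": 50.0}),
]
normals = [
    [Event("normal", {"CD19": 3000.0, "CD10": 400.0})],
    [Event("normal", {"CD19": 90.0, "CD10": 10.0})],
]
pooled = pool_with_reference(patient, normals)
```

Keeping the `source` label on every pooled event is one way to tell, after embedding, whether a cluster is made up of patient events only or overlaps with the normal reference marrows. In open-source tooling, the `umap-learn` package exposes `n_neighbors` and `min_dist` parameters that would correspond to the UMAP settings tuned in this study.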

Results: Initially, a selection of samples with varying MRD levels (n=6) was run using UMAP with a range of minimum distance and number of neighbours parameters. A minimum distance of 0.1 and a number of neighbours of 13 were the optimal UMAP settings, producing a very clear separation of MRD islands from healthy cell populations across all samples (Figure 1). These settings were then used for 341 B-ALL MRD samples at different time points: day 8 (n=116), day 15 (n=32), day 29 (n=118) and weeks 6-20 (n=75). MRD levels from UMAP were compared to those generated by the gold standard, sequential gating. Excellent concordance was seen, as demonstrated by an R² value of 0.99 (linear correlation). For 'on treatment' samples, UMAP showed excellent accuracy, with sensitivity and specificity values of 93% and 100%, respectively. The positive predictive value was 100%, while the negative predictive value was 94% because UMAP analysis failed to identify an MRD island in 11 'on treatment' samples, 9 at day 8 and 2 at day 28. The average UMAP run time was less than 2 minutes.
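The agreement statistics reported above follow from standard definitions, sketched here in plain Python. Only the 11 false negatives and the total of 341 samples come from the abstract. The remaining confusion-matrix counts are back-calculated assumptions chosen to be consistent with the reported percentages, and they assume that all 341 samples were included in the accuracy analysis. The function names are illustrative.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


# ASSUMED counts: fn=11 and the total of 341 are from the abstract; tp, tn and
# fp are back-calculated for illustration and are not the study's own table.
metrics = diagnostic_metrics(tp=146, fp=0, tn=184, fn=11)
# sensitivity = 146/157, which rounds to 0.93
# npv         = 184/195, which rounds to 0.94
```

With no false positives, specificity and positive predictive value are both exactly 1, so under these assumed counts the 11 missed MRD islands are the only source of error. `r_squared` can be applied to paired MRD values from UMAP and sequential gating to reproduce the kind of linear correlation reported above.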

Conclusion: Detection of MRD using the UMAP algorithm showed an excellent correlation with the gold standard sequential gating technique. False negatives were due to overlap of antigen expression with normal cells and/or low ALL cell numbers. Current Flow MRD methodologies use 8 or more antibodies and higher target cell numbers, which may reduce the false negative UMAP call rate. In summary, UMAP shows promise for simplifying the downstream analysis of Flow MRD data by reducing the need for extensive training, limiting human subjectivity and saving time and therefore cost.

No relevant conflicts of interest to declare.
